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Abstract. On distributed memory electronic computers, the implementation and association of fast parallel 
matrix multiplication algorithms has yielded astounding results and insights. In this discourse, we use the 
tools of molecular biology to demonstrate the theoretical encoding of Strassen's fast matrix multiplication 
algorithm with DNA based on an n-moduli set in the residue number system, thereby demonstrating the 
viability of computational mathematics with DNA. As a result, a general scalable implementation of this 
model in the DNA computing paradigm is presented and can be generalized to the application of all fast 
matrix multiplication algorithms on a DNA computer. We also discuss the practical capabilities and issues 
of this scalable implementation. Fast methods of matrix computations with DNA are important because 
they also allow for the efficient implementation of other algorithms (i.e. inversion, computing determinants, 
and graph theory) with DNA. 



1. Introduction 

The multiplication of matrices is a fundamental operation applicable to a diverse range of algorithms from 
computing determinants, inverting matrices, and solving linear systems to graph theory. Indeed, Bunch and 
Hopcroft |12) successfully proved that given an algorithm for multiplying two n x n matrices in 0(n a ) opera- 
tions where 2 < a < 3, then the triangular factorization of a permutation of any n x n nonsingular matrix as 
well as its inverse can be found in 0(n a ) operations. The standard method of square matrix multiplication 
requires 2n 3 operations. Let u be the smallest number such that 0(n w+e ) multiplications suffice for all e > 0. 
Strassen |16] presented a divide-and-conquer algorithm using noncommutative multiplication to compute the 
product of two matrices (of order m2 k ) by m 3 7 k multiplications and (5 + m)m 2 7 k — 6m 2 2 2k additions. Thus, 
by recursive application of Strassen's algorithm, the product of two matrices can be computed by at most 
(4.7)n log2 7 operations. Following Strassen's work, Coppersmith and Winograd [2] were able to improve the 
exponent to 2.38. Their approaches and those of subsequent researchers rely on the same framework: For 
some fc, they devise a method to multiply matrices of order k with m <SC A; 3 multiplications and recursively 
apply this technique to show that uj < log k m. Only until recently, it was long supposed that ui could take on 
the value of 2 without much evidence. Using a group-theoretic construction, Cohn, Kleinberg, Szegedy, and 
Umans [7] rederived the Coppersmith- Winograd algorithm to describe several families of wreath product 
groups that yield nontrivial upper bounds on u, the best asymptotic result being 2.41. They also presented 
two conjectures in which either one would imply an exponent of 2. 

Unfortunately, although these improvements to Strassen's algorithm are theoretically optimal, they lack 
pragmatic value. In practice, only the Strassen algorithm is fully implemented and utilized as such: 

For even integers m, n, and k, let X £ R mxfc an d Y £ M. kxn be matrices with product Q £ R mx ™, and 

set 

Y I Xqq Xq{\ _ /Ybo ^oi\ n — ( Q°° 

\Iio XuJ* \Y W YuJ' ^~\Qio Qn 

where X ld £ K m / 2xfc / 2 , £ M. k / 2xn / 2 , and Q l0 £ R™/2xn/2_ Then perform the following to compute 
Q = XY, 

Mo := (Xqq + Xn)(Yoo + In), 
Mi := (Xio + Xxx)Yqo, 
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M 2 := X 00 {Y 01 - Y u ), 

M3 '■— -X11 ( — ^00 + ^10)) 
M 4 := (X 00 + X 01 )Y 11 , 

m 5 := (-x 00 + x 10 )(r 00 + r 01 ), 

M 6 := (X i-X 11 )(y 1 o+yii), 
Qoo =M +M 3 -M 4 + M 6 , 

Q01 = Mi + M 3 , 

Q10 = M 2 + M 4 , 

Qn = M + M 2 — Mi + M 5 . 

Even if the dimension of the matrices is not even or if the matrices are not square, it is easy to pad the 
matrices with zeros and perform the aforementioned algorithm. 

Typically, computations such as this one are performed using electronic components on a silicon substrate. 
In fact, it is a commonly held notion that most computers should follow this model. In the last decade 
however, a newer and more revolutionary form of computing has come about, known as DNA computing. 
DNA's key advantage is that it can make computers much smaller than before, while at the same time 
maintaining the capacity to store prodigious amounts of data. Since Adleman's [9] pioneering paper, DNA 
computing has become a rapidly evolving held with its primary focus on developing DNA algorithms for 
NP-complete problems. However, unlike quantum computing in recent years, the viability of computational 
mathematics on a DNA computer has not yet been fully demonstrated for the whole field of DNA-based 
computing has merged to controlling and mediating information processing for nano structures and molecular 
movements. In fact, only recently have the primitive operations in mathematics (i.e. addition, subtraction, 
multiplication, and division) been fully implemented and optimized. Thus, the general problem dealt with 
in this paper is to show that computational mathematics is feasible with DNA. Fujiwara, Matsumoto, and 
Chen 1! proved a DNA representation of binary integers using single strands and presented procedures for 
primitive mathematical operations through simple manipulations in DNA. It is important to note that the 
work of Fujiwara et al. [1 and those of subsequent researchers have relied upon a fixed-base number system. 
The fixed-base number system is a bottleneck for many algorithms as it restricts the speed at which arithmetic 
operations can be performed and increases the complexity of the algorithm. Parallel arithmetic operations 
are simply not feasible in the fixed-base number system because of the effect of a carry propagation. Recently, 
Zheng, Xu, and Li |17j have presented an improved DNA representation of an integer based on the residue 
number system (RNS) and give algorithms of arithmetic operations in Zu = {0, 1, • " ; — 1} where Zm is 
the ring of integers with respect to modulo M. Their results exploit the massive parallelism in DNA mainly 
because of the carry- free property of all arithmetic operations (except division, of course) in RNS. 

In this paper we present a parallelization method for performing Strassen's fast matrix multiplication 
methods on a DNA computer. Although DNA-based methods for the multiplication of boolean [5] and 
real-numbered matrices [B] have been proven, these approaches use digraphs and are not divide-and-conquer 
like Strassen's algorithm (and hence are not particularly efficient when used with DNA). Divide-and-conquer 
algorithms particularly benefit from the parallelism of the DNA computing paradigm because distinct sub- 
processes can be executed on different processors. The critical problem addressed in this paper is to provide 
a DNA implementation of Strassen's algorithm, while keeping in mind that in recent years it has been shown 
that the biomolecular operations suggested by the Adleman-Lipton model are not very reliable in practice. 
More specifically, the objectives we aim to accomplish in this research paper are the following: 

• To provide in §2 a revised version of the Adleman-Lipton model that better handles recursive ligation 
and overcomes the confounding of results with the complexity of tube content. 

• To establish a systematic approach in §3 of representing and adding and subtracting matrices using 
DNA in the RNS system. 

• Next, based on this representation system, we describe in §4.1 an implementation of the Cannon 
algorithm with DNA at the bottom level. 
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• And lastly, we present in §4.2 a method to store the different sub-matrices in different strands, and 
in §4.3, we prove a mathematical relation between the resultant matrix and the sub-matrices at 
recursion level r. 

Our approach uses the Cannon algorithm at the bottom level (within a tube containing a memory strand) 
and the Strassen algorithm at the top level (between memory strands). We show that the Strassen-Cannon 
algorithm decreases in complexity as the recursion level r increases [3] . If the Cannon algorithm is replaced 
by other parallel matrix multiplication algorithms at the bottom level (such as the Fox algorithm), our 
result still holds. The difficulty that arises is that in order to use the Strassen algorithm at the top level, 
we must determine the sub-matrices after the recursive execution of the Strassen formula r times and then 
find the resultant matrix. On a sequential machine, this problem is trivial; however, on a parallel machine 
this situation becomes much more arduous. Nguyen, Lavallee, and Bui [3] present a method for electronic 
computers to determine all the nodes at the unspecified level r in the execution tree of the Strassen algorithm, 
thereby allowing for the direct calculation of the resultant matrix from the sub-matrices calculated by parallel 
matrix multiplication algorithms at the bottom level. Thus, we show that this result can theoretically be 
obtained using DNA, and combined with a storage map of sub-matrices to DNA strands and with the usage 
of the Cannon algorithm at the bottom level, we have a general scalable implementation of the Strassen 
algorithm on Adlcman's DNA computer. As of the moment, we should note that this implementation is 
primarily theoretical because in practice, the Adleman-Lipton model is not always feasible, as explained in 
§5. The reason why we concentrate on the Strassen algorithm is that it offers superior performance than the 
traditional algorithm for practical matrix sizes less than 10 20 However, our methods are also applicable 
to all fast matrix multiplication algorithms on a DNA computer, as these algorithms are always in recursive 
form [15j . In addition, our results can be used to implement other algorithms such as inversion and computing 
determinants on a DNA computer since matrix multiplication is almost ubiquitous in application. 

2. Preliminary Theory 

2.1. The Residue Number System. The residue number system is defined by a set of pairwise, coprime 
moduli P = {g„_i, • • • , qo}. Furthermore, an integer in RNS is represented as a vector of residues with respect 
to the moduli set P. As a consequence of the Chinese remainder theorem, for any integer x <E [0, M — 1] 
where M = JlILo eacn RNS representation is unique. As stated by Zheng, Xu, and Li [T7], the vector 
(x n -i, ■ ■ ■ ,xq) denotes the residue representation of x. 

It has been previously mentioned that one of the important characteristic of RNS is that all arithmetic 
operations except for division are carry-free. Thus, for any two integers x — > (x„_i,-- - ,xq) £ Zm and 
U ~* (j/n— l) ' ' ' i Ho) € Zm we obtain the following from [5] 

(2-1.1) \xoy\ M -> (\x n -l °2/n-l|g n -iV A x Q°yo\qo) > 

in which o is any operation of addition, subtraction, or multiplication. 

2.2. The Adleman-Lipton Model. In this section we present a theoretical and practical basis for our al- 
gorithms. By the Adleman-Lipton model, we define a test tube T as a multi-set of (oriented) DNA sequences 
over the nucleotide alphabet {^4, G, C, T}. The following operations can be performed as follows: 

• Merge{Ti,T2): merge the contents in tube T± and tube T2, and store the results in tube T\; 

• Copy(T±, T2): make a copy of the contents in tube T\ and store the result in tube T2] 

• DetectiT): for a given tube T, this operation returns "True" if tube T contains at least one DNA 
strand, else it returns "False" ; 

• SeparationiTi, A, T2): from all the DNA strands in tube T\, take out only those containing the 
sequences of X over the alphabet {^4, G, C, T} and place them in tube T2; 

• Selecbion(Ti,l,T2): remove all strands of length / from tube T\ into tube T2; 
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• Cleavage(T, tjQai): given a tube T and a sequence <Jq(T\, for every strand containing 
the cleavage operation can be performed as such: 

Cleavage(T,ao <J\ ] 



<7 C7i 



(T (Tl 



then 



aocr <Ti/3o 











J 





where the overhead bar denotes the complementary strand. 

• Annealing(T): produce all feasible double strands in tube T and store the results in tube T (the 
assumption here is that ligation is executed after annealing); 

• Denaturation(T): disassociate every double strand in tube T into two single strands and store the 
results in tube T; 

• Empty (T): empty tube T. 

According to [5], the complexity of each of the aforementioned operations is 0(1). 



2.3. Revised Adleman-Lipton Model through Ligation by Selection. In practice, the recursive prop- 
erties of our implementation of the Strassen-Canon algorithm require a massive ligation step that is not fea- 
sible. The reason is that, in practice, the biomolecular operations suggested by the Adleman-Lipton model 
are not completely reliable. This ligation step cannot produce longer molecules as required by our imple- 
mentation, and certainly not more than 10-15 ligations in a row. Not to mention that both the complexity of 
the tube content and the efficiency of the enzyme would obscure the results. As a result of these considera- 
tions, the operations Separation(Tl,X,T2) and Annealing (T) presented in §2.2 function with questionable 
success when applied to a complex test tube, especially when recursion is used. 

Therefore, in order for matrix multiplication under the Adleman-Lipton model to be completely reliable 
in practice and the aforementioned problems circumvented, these streptavidin based operations must be 
improved upon. That way, the parallelization offered by DNA can be utilized as an important mathematical 
tool with performance capabilities comparable to the electronics. One way we propose to overcome this po- 
tential setback of ligation is to use a modified ligation procedure that can handle longer molecules in place of 
the original, termed ligation by selection presented in 13 . Ligation by selection (LBS) is a method to ligate 
multiple adjacent DNA fragments that does not require intermediate fragment isolation and is amenable 
to parallel processing, therefore reducing the obfuscation of the results by the complexity of tube content. 
Essentially in LBS, fragments that are adjacent to each other are cloned into plasmid markers that have a 
common antibiotic marker, a linking restriction site for joining the fragments, a linking restriction site on 
the vector, and each vector has a unique site to be used for restriction-purification and a unique antibiotic 
marker. The method is applied to efficiently stitch multiple synthetic DNA fragments of 500-800 bp together 
to produce segments of up to 6000 bp in [13] . For a cogent and complete explanation of ligation by selection 
we refer the reader to [13] . 

To utilize LBS recursively, the alteration of resistance markers and restriction-purification sites of acceptor 
and donor vectors that occur in each LBS cycle must be accounted for in order to minimize the number of 
cycles required in parallel processing. As opposed to conventional ligation, the advantages that LBS has are 

m 

• The avoidance of the need to isolate, purify, and ligate individual fragments 

• The evasion of the need for specialized MCS linkers 

• And most importantly, the ease with which parallel processing of operations may be applied 

Hence, in order for the Adleman-Lipton model to be more relaible in the recursive operations our implemen- 
tation of Strassens algorithm requires, we replace the ligation procedure of §2.2 with LBS. 



3. DNA Matrix Operations in RNS 

3.1. DNA Representation of a Matrix in RNS. We extend the DNA representation of integers in RNS 
presented in [17] to representing an entire matrix Y in RNS by way of single DNA strands. Let matrix Y 
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be a t x t matrix with: 



2/21 



Y 



yi2 

2/22 



2/2t 



yti yt2 



ytt 



The key here is the RNS representation of each element y qr in the hypothetical matrix Y with 1 < q < t 
and 1 < r < t by way of DNA strands. 

We first utilize the improved DNA representation of n binary numbers with m binary bits as described 
in [17] for the alphabet J^: 

= {A l ,B :i ,C o ,Ci,Eo,E 1 ,Do,D 1 ,l,0, #|0 < i < M - 1,0 < j < m}. 

Here, Ai indicates the address of M integers in RNS; Bj denotes the binary bit position; Co, Ci, Eq, Ei, 
Dq, and D\ are used in the Cleavage operation; jf= is used in the Separation operation; and and 1 are 
binary numbers. Thus, in the residue digit position, the value of the bit y qr with a bit address of i and a bit 
position of j can be represented by a single DNA strand (Sij)y qr : 

(3.1.1) (Si,j) q r = (D 1 B ] E E 1 A i C C 1 VD )y qr , 

for V G {0, 1}. Hence, the matrix Y can be represented as such: 



Y : 



/(D 1 B j E E 1 A i C C 1 VD ) vll (D 1 B j E E 1 A i C C 1 VD ) yi2 
(D 1 B J E E 1 A i C C 1 VD )y 21 (D 1 B j E E 1 AiCoC 1 VD )y 22 



\(D 1 B j E E 1 A i C C 1 VD )y tl (D^EoEUiCod VD \ 



(D 1 B j E E 1 A i CoC 1 VDo)y lt -\ 
(D 1 B j EoE 1 A i C C 1 VD )y 2t 



(D 1 BjEoE 1 A i C C 1 VDo)y u J 



where each strand-element is not necessarily distinct. The reader must keep in mind that M integers in 
RNS defined by the n-moduli set P can be represented by 2M(m + 1) different memory strands, whereas in 



the binary system, the respresentation of M integers requires 2M ( 1 + X) 



different memory strands. 



3.2. Residue Number Arithmetic with Matrices. From (|2.1.ip . it is apparent that the operation o is 
carry-free, thereby allowing for the employment of parallel procedures in all residue digits. In |17j two prop- 
erties are given for the modular operation involving two integers x — > (ir n _i, ■ • ■ , xq) and y — > (y n -i, • • ■ , yo) 
in RNS defined by the set P = {2 m —i, 2 m — 2 - 1, • • • , 2 m ° - 1}. 

Lemma 3.2.1. For Vj ; m„_i € N, if j < m„_i then |2 J '| 2m „-i = V else |2 J | 2 ™„-i = 0. 

Lemma 3.2.2. For I = 0, • • • , n — 2, let xi + yi = Zi where zi — (z/( m( ), • • • ,z/o)- If z i > 2 mi — 1, then 

\z l \ 2 ^- 1 = i + ET=o 1 ^ 23 - 

Next, the procedures RNSAdd and RNSDiff add and subtract two integers in RNS defined by the moduli 
set P, respectively. The pseudocode for RNSAdd and RNSDiff is given in §4.4 of [17], and we refer the 
reader to that source (note that the pseudocode of [17] for both algorithms utilizes the operations presented 
in §2.2 extensively). Instead, we provide some background on the two procedures. The inputs are In tubes 
T^ qr and Tf qr (for I = 0, • • • , n — 1) containing the memory strands representing the elements x qr and y qr 
of t x t matrices X and Y, respectively. Once either operation is complete, it returns n tubes T l Rsum and 
rpRdiff con ^ amm g t ne result of residue addition or subtraction, respectively. We also use the following n 
temporary tubes for RNSAdd, namely, Tj? emp , T l sum , and T l sum ,. Similarly for RNSDiff, the n temporary 
tubes, T l temp , T l diff , and T l d iff , ar e used. 

Thus, based on Lemma 13 2.11 and Lemma I3.2.2[ we introduce the following two algorithms for matrix 
addition and subtraction in RNS which will be used when dealing with the block matrices in Strassen's 
algorithm. For the sake of example, we are adding (and subtracting) the hypothetical t x t matrices X 
and Y. Essentially, the RNSMatrixAdd and RNSMatrixDiff algorithms employ RNSAdd and RNSDiff in a 
nested FOR loop. 
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3.2.3. Matrix Addition. The procedure RNSMatrixAdd is defined as: 
Algorithm 3.1: RNSMAtmxAdd(Tx, Ty) 

for q 4— to t 
do 

{for r 4— to t 
do 
{RNSAdd(T^V ■ ■ ,To qr ,T^_\, ■ ■ ■ ,7%")', 



3.2.4. Matrix Subtraction. The procedure RNSMatrixDiff is defined as: 
Algorithm 3.2: RNSMatrixDiff(T x , T Y ) 

for q 4— to t 
do 

( for r 4- to t 
J do 



|RNSDiff(r^ r 1 , 



7l-l> 



4. Strassen's Algorithm Revisited 

4.1. Bottom-Level Matrix Multiplication. Although a vast repository of traditional matrix multiplica- 
tion algorithms can be used between processors (test tubes containing memory strands; however for the sake 
of brevity, we shall just use the term "memory strand" or "strand"), we will employ the Cannon algorithm 
|10) since it can be used on matrices of any dimensions. We will only discuss square strand arrangments 
and square matrices for simplicity's sake. Assume that we have p 2 memory strands, organized in a logical 
sequence in a p x p mesh. For i > and j < p— 1, the strand in the i th row and j th column has coordinates 
(i,j). The matrices X, Y, and their matrix product Q are of size txt, and again as a simplifying assumption, 
let t be a multiple of p. All matrices will be partitioned into px p blocks of s x s sub-matrices where s = t/p. 
As described in [3] , the mesh can be percieved as an amalgamation of rings of memory strands in both the 
horizontal and vertical directions (opposite sides of the mesh are linked with a torus interconnection). A 
successful DNA implementation of Cannon's algorithm requires communication between the strands of each 
ring in the mesh where the blocks of matrix X are passed in parallel to the left along the horizontal rings 
and the blocks of the matrix Y are passed to the top along the vertical rings. Let Xy, Yy, and Qij denote 
the blocks of X, Y, and Q stored in the strand with coordinates (i,j). The Cannon algorithm on a DNA 
computer can be described as such: 

Algorithm 4.1: CANNON(Tx ij - , T Yij ) 

for i column <— to i 
do 

{LeftShift(T XlJ ) 
for j th column 4— to j 
do 

{UpShift(3Vy ) 

V strands 
do 

{Value Assignment (Tx^-yy , Tq.. ) 

do (p — 1) times 
LeflShift(Tx y ) 
UpShiftpYy) 

ValueAssignment (T RNSMatrixAdd{TQ ^ iTx . y _ j , T Qij ) 
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Note that the procedure UpShift can be derived from Zheng et al.'s [T7] LeftShift. Now we examine the 
run-time of the Cannon algorithm. The run time can be componentized into the communication time and 
the computation time, and the total communication time is 

2B8t 2 

(4.1.1) 2pa + — , 



and the computation time is 

2tH, 



(4.1.2) 



comp 



P 2 



where t comp is the execution time for one arithmetic operation, a is the latency, /? is the sequence-transfer 
rate, the total latency is 2pa, and the total sequence-transfer time is 2p(3B(m/p) 2 with B as the number of 
sequences to store one entry of the matrices. According to [3J, the running time is 

(4.1.3) T (t)= 2t3 \ omp +2pa + ^-. 

p z p 



4.2. Matrix Storage Pattern. The primary difficulty is to be able to store the different sub-matrices of 
the Strassen algorithm in different strands, and these sub-matrices must be copied or moved to appropriate 
strands if tasks are spawned. Hence, we present here a storage map of sub-matrices to strands based on 
the result of Luo and Drake [IT] for electronic computers. Essentially, if we allow each strand to have a 
portion of each sub-matrix at each resursion level, then we can make it possible for all strands to act as one 
strand. As a result, the addition and subtraction of the block matrices performed in the Strassen algorithm 
at all recursion levels can be performed in parallel without any inter-strand communication [3] . Each strand 
performs its local sub-matrix additions and subtractions in RNS (via RNS Matrix Add and RNSMatrixDiff 
described in §3). At the final recursion level, the block matrix multiplications are calculated using the 
Cannon algorithm in §4.1. 

For instance, if we suppose that the recursion level in the Strassen-algorithm is r, and let n — t/p, to = t/2, 
and no = to/p for n, to, no G N, then the run-time of the Strassen-Canon algorithm is: 

(4.2.1) T{t) = 18T add (t) +7T f ' 



2 \2 



where T a dd (5) is the run-time to add or subtract block matrices of order t/2. 
Additionally, according to (9) of [3J, 



9(l) r t 3 t <~\(lYi /7 

\8/ b comp L comp I I 

p 2 p 2 \ 4 



(4.2.2) T t « V6 ' + ; co "^ + (l) 2 pa. 



Since the asymptotically significant term p a c ° mp decreases as the recursion level r increases, then for 
t significantly large, the Strassen-Cannon algorithm should be faster than the Cannon algorithm. Even if 
the Cannon algorithm is replaced at the bottom level by other parallel matrix multiplication algorithms, the 
same result holds. 



4.3. Recursion Removal. As has been previously discussed, in order to use the Strassen algorithm between 
strands (at the top level), we must determine the sub-matrices after r times recursive execution and then 
to determine the resultant matrix from these sub-matrices. Nguyen et al. [3j recently presented a method 
on electronic computers to ascertain all of the nodes in the execution tree of the Strassen algorithm at 
the unspecified recursion level r and to determine the relation between the sub-matrices and the resultant 
matrix at level r. We extend it to the DNA computing paradigm. At each step, the algorithm will execute 
a multiplication between 2 factors, namely the linear combinations of the elements of the matrices X and 
Y, respectively. Since we can consider that each factor is the sum of all elements from each matrix, with 
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coefficient of 0, -1, or 1 |3j, then we can represent these coefficients with the RNS representation of numbers 
with DNA strands described in §3.1 as such: 

({DxBtEoExAaCodODo, D^EoE^CoC^Do}, {D^E^AoCodODo, D^E^AoCodODo}, 

{D 1 B l E Q E 1 A (s C (s C l QD Q ,D 1 B E Q E 1 A (s C (s C l QD }), 

{{D 1 B 1 E E 1 A^ l C C 1 lD ,D 1 B E E 1 A_ 1 CoC 1 lD },{D 1 B 1 E EU-iC C 1 lDo,D 1 B E E 1 A^ 1 CoC 1 ^^ 

{D x B x E ExA-iC CxlD ,D x B E Q ExA- X C CxlD }), 

or 

({D 1 B 1 E E 1 A 1 C C 1 OD ,D 1 B E E 1 A 1 C C 1 1D },{D 1 B 1 E E 1 A 1 C C 1 OD ,D 1 B E E 1 A 1 C C 1 1D }, 

{D^EoE^CoCrfDo.D^oEoE^CoC^Do}), 

respectively. For the sake of brevity, we shall denote the latter three equations as (O)fljvs', ( — 1)_rjvSi and 
(^)rns, respectively. This coefficient is obtained for each element in each recursive call and is dependent 
upon both the index of the call and the location of an element in the division of the matrix by 4 sub- 
matrices (3J- If we view the Strassen-Cannon algorithm's execution as an execution tree [3J, then each scalar 
multiplication is correlated on a leaf of the execution tree and the path from the root to the leaf represents the 
recursive calls leading to the corresponding multiplication. Furthermore, at the leaf, the coefficient of each 
element (either (0)rns, {—l)rns, or (1)rns) can be determined by the combination of all computations in 
the path from the root. The reason is that since all of the computations are linear, they can be combined in 
the leaf (which we will denote by ti). 

Utilizing the nomenclature of [3J, Strassen's formula can be depicted as such: 
For I = • • • 6, 

(4-3.1) U= >;,SXil.;.j. x VijSY{l,i,j), 

i,j=0,l i,j=0,l 

and 

6 

(4.3.2) qij =J2tlSQ(hi,j), 

1=0 

in which 



SX = 





00 


01 


10 


11 





Wrns 


(Q)rns 


(0)rns 


{0)rns 


1 


(0)rns 




(0)rns 


{0)rns 


2 


(0)rns 


(Q)rns 


(l)i?WS 


Wrns 


3 


(-l)flJVS 


(Q)rns 


(^)rns 


Wrns 


4 


(1)kns 


(0)rns 


(-I)jws 


{0)rns 


5 


(0)rns 


(O)rns 


(^)rns 


(1) RNS 


6 


{0)rns 


(0)rns 


(0)rns 


(1)rNS 



SY = 





00 


01 


10 


11 





(^)rns 


(0)rns 


(0)rns 


(O)rns 


1 


(0)rns 


(0)rns 


Wrns 


{0)rns 


2 


(~1)rns 


{1)rns 


{0)rns 


(0)rns 


3 


(l)rns 


(—1)rns 


{0)rns 


(l)i?7VS 


4 


(0)rns 


(—1)rns 


{0)rns 


(l)-RJVS 


5 


(0)rns 


(l)flJVS 


(0)rns 


Wrns 


6 


(-l)flJVS 


(l)flJVS 


Wrns 


(—1)rns 
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SQ 



At recursion level r, ti can be represented as such: 
For I = 0---7 k - 1, 



1\ ii 


nn 
uu 


ni 


i n 

1U 


1 1 
± ± 





Wrns 


(I)hjvs 


(l)-RJVS 


(^)rns 


1 


(^)rns 




(0)_R7VS 


(0)rns 


2 


(0)rns 


(l)flJVS 


(0)rns 


(0)rns 


3 


(0)rns 


(l)flJVS 


Wrns 


(^)rns 


4 


{0)rns 




{0)rns 


(^)rns 


5 


(0)rns 


(1)hjvs 


(0)rns 


(0)rns 


6 


(O)rns 




{0)rns 


(1)rns 



(4.3.3) 

and 

(4.3.4) 



U= Y XijSXk(l,i,j)x Y y tJ SY k (l,i,j), 

ij—n—l i,j—0,n — l 



1=0 



It is easy to see that SX = SX\, SY = SY\, and SQ = SQi; however, the difficulty that arises is to 
determine the values of matrices SX^, SY k , and SQ k in order to have a general algorithm. The following 
relations were proved in [4], and we shall prove that these results hold with DNA: 



(4.3.5) 



(4.3.6) 



(4.3.7) 



SX k (l,i,j) = Y[ SX(l r ,i r ,j r ), 

r=l 
k 

SY k (l,i,j) = Y[ SY(l r ,i r ,j r ), 

r=l 
k 

SQk(l,i,j) = Y[ SQ(l r ,i r ,j r )- 



First we shall extend the definition of the tensor product for arrays of arbitrary dimensions 0] by representing 
the tensor product in RNS by way of single DNA strands. 

Proposition 4.3.1. Let A and B be arrays of the same dimension I and of size m\ x m 2 X • • • x mi and 

n\ x ri2 x ••• x ni, respectively. The elements of A and B are represented using RNA by way of DNA 
strands as presented in detail in §5.1. The tensor product can thus be described as an array of the same 
dimension and of size mini x mini x • • • x mini in which each element of A is replaced with the product 
of the element and B. This product can be computed with the algorithm RNS Mult which is recognized by 
a serial of operations of the RNS Add algorithm detailed in $4-4 °f Zheng et al. [17] . P — A ® B where 
P[i\i *2, • • • i h] — A[ki, k%, • • • , ki]B[hi, h 2 , • • • , hi]. 1 < Vj < I, ij = kjnj + hj (kjTij and hj will be added 
with RNS Add). 



If we let P = <£)? =1 Ai — (■ ■ ■ (Ai Cg> A2) <8> A3) ■ ■ ■ <g> A n ) where A, is an array of dimension I and of size 
ma x mi2 x ■ • • x ma 1 the following theorem allows us to directly compute the elements of P. All products 
and sums of elements can be computed with RNSMult and RNSAdd, respectively. 

Theorem 4.3.2. If we let j k = YZ=\ ( h sk Il"= s +i m rk) , then P[ji,j 2 , •■• ,ji]= 11"= 1 A AhiU fk2, •■■ , h a ]. 

Proof. We give a proof by induction. For n = 1 and n = 2, the statement is true. Assume it is true with n, 
then wc shall prove that it is true with n + 1. 
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P n +i [n, v 2 , ■ ■ ■ , vi] = A i [hi ,h i2 ,--- , h u ] where 



n+l / n+l \ 



s—l \ r— s+1 



for 1 < Vfc < I. Hence, P n +\ — P n ® A n+ i. 
Furthermore, by definition, 



n+l 



Pn+l\jl, 32, ■■■ =P n \Pl,P2, ■■■ ,Pl]A n+ i[h( n+1) ,h 2 ( n+ i), - ■ ■ ,/li(n+l)]= JJ [ha , h i2 , ■ ■ ■ ,hu], 

i=l 

where 

n / n+l \ n+l / n+l \ 

jfc = E /l S fe ]"[ "Vfc + /lfe(n+l) = ^ I ^afe J][ "Vfc • 
s=l \ r=s+l / s=l \ r=s+l / 

■ 

Theorem 4.3.3. SA fe = O^^X, SYfc = ® k i=1 SY , and SQ k = ® k =1 SQ. 

Proof. We give a proof by induction. For k = 1, the statement is true. Assume it is true with k, then we 
shall prove that it is true with k + 1 . 

According to (|4.3.3[) and (|4.3.4p . at level k + 1 of the execution tree, for < I < 7 k+1 - 1 

Ti=i J2 X k+1>ij SX k+1 (l,i,j) ) x ^ y fe+Mj -5Y fc+1 (Z,i,j) 

\i>0,i<2'= + 1 -l / \i>0J<2 fc + 1 -l 

It follows from (|IXTj) and (|4~3T2"j) that at level k + 2, for < I < 7 k+1 - 1 and < I' < 6 

i'>0,j'<l \i>0,7<2 fc + 1 -l 

(4.3.8) - - V - J " 

E f E ^ +1 , ii [i' ! i']5y &+1 G,i ! i)5y(r ! i' ! i')] , 

i'>0j'<l \i>0,j<2 fc + 1 -l / 

where X^+i,^ j'] and Y k +\^j [i 1 , f] are 2 fc+2 x 2 fe+2 matrices obtained by partitioning the matrices X k+ i yi j 
and Yfe+i.ij into 4 sub-matrices (we use i' and j' to denote the sub-matrix's quarter). 

We represent 1,1' in base 7 RNS, and in base 2 RNS. Since Afe + i :i j [i' , j'] = X k +2,ij [H'n 33 2 \ tnen 

for < W {7) < 7 k+1 - 1, 

M[W {7) }= Yl X k+2 \W i2) ,jf(2)]SX k+1 (l,iJ)SX(l',i',j')\x 

\.^ 7 (2)>0,TJ 7 (2) <2 fc + 1 -l 

(4.3.9) V 

E n-+2F(2),F ( 2)]^+ia,^j)^'^y) | • 

\i* r (2)>0,W {2) <2 k + 1 -l 
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Moreover, it directly follows from (14.3.31) and (14.3.41) that for < IV {7) < 7 k+1 - 1, 

/ 



M[IF (7) ] = 



\"'(2)>0JJ' (2) <2'= + 1-1 



X X k+2 [ii' (2), jj' (2) ]SX k+2 (ll'(7),ii'(2),j3'(2)) 



(4.3.10) 

X Y k+2 [ii' (2)-, H' (2)]SY k+ 2 (ll'(7),H'(2),3? 

» 7 (2)>0,i7 7 (2) <2fc + i-l 

From (|4.3.12p and (|4.3.10j) . we have 

sx k+2 (T%,i%,M) = sx k+1 (i,i,j)sx(i',i'f), 

and 

SY k+2 (W 7 ,W 2 ,]jf) = SY k+1 (l,i,j)SY(l',i'j'). 

Thus, 

SX k+2 = SX k+1 ®SX= ®-+?SX, 
SY k+2 = SY k+1 ®SY = ®*+?SY, 

and 

SQ k+2 = SQ k+1 ®SQ = ® k t ^SQ. 



(2) 



From Theorem and Theorem \4~3M (|4X5]h (f47376|) . and (jOT)) follow. 
As a consequence of ()4.3.3j) - (|4.3.7|) . we can form the following sub-matrices: 



(4.3.11) 



Ti= X * \Y[SX(l u ,i u ,j u ) x J2 Y iAl[SX{l u ,i u J u ) 



i,j=0,2 r -l \u=l 



i,j=0,2 r -l \u=l 
Z=0---7 r -l 



As a result of the storage map of sub-matrices to strands presented in §4.2, the following sub-matrices can be 
locally determined within each strand, and their product Ti can be computed by the DNA implementation 
of the Cannon algorithm presented in §4.1: 



y\_ sx o>u, iuju) 



i i=0,2 r -l \m=1 
\j=0,2 r -l 



and 



Ml till ]v 



. i=0.2 r -l \u=l 
\j=0,2 r -l 



J 



All of the sub-matrices are added with the RNS Matrix Add algorithm presented in §3.2.3. 

Lastly, it is important to note that due to (I4.3.3l) - (|4.3.7p . we have derived a method to directly compute the 
sub-matrix elements of the resultant matrix via the application of matrix additions (using the RNSMatrixAdd 
algorithm of §3.2.3) instead of backtracking manually down the recursive execution tree to compute: 



7 r -l 



7 r -l 



(4.3.12) 



1=0 



1=0 \u=l 



12 



ARAN NAYEBI 



5. Conclusion 

Our general scalable implementation can be used for all of the matrix multiplication algorithms that use 
fast matrix multiplication algorithms at the top level (between strands) on a DNA computer. Moreover, 
since the computational complexity of these algorithms decreases when the recursion level r increases, we can 
now find optimal algorithms for all particular cases. Of course, as mentioned previously in this paper, the 
current science of DNA computing does not guarantee a perfect implementation of the Strassen algorithm 
as described herein; for now, these results should be regarded as primarily theoretical in nature. 
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